[1]:
%run ../initscript.py
HTML("""
<div id="popup" style="padding-bottom:5px; display:none;">
<div>Enter Password:</div>
<input id="password" type="password"/>
<button onclick="done()" style="border-radius: 12px;">Submit</button>
</div>
<button onclick="unlock()" style="border-radius: 12px;">Unclock</button>
""")
# <a href="#" onclick="code_toggle(this); return false;">show code</a>
[1]:
[2]:
%run loadfuncs.py
from ipywidgets import *
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
toggle()
[2]:
Seasonal ARIMA (SARIMA) Models¶
The ARIMA model does not support seasonality. If the time series data has defined seasonality, then we need to perform seasonal differencing and SARIMA models.
Seasonal differencing is similar to regular differencing, but, instead of subtracting consecutive terms, we subtract the value from previous season.
The model is represented as SARIMA\((p,d,q)x(P,D,Q)\), where - \(D\) is the order of seasonal differencing - \(P\) is the order of seasonal autoregression (SAR) - \(Q\) is the order of seasonal moving average (SMA) - \(x\) is the frequency of the time series.
[3]:
fig, axes = plt.subplots(2, 1, figsize=(10,5), dpi=100, sharex=True)
# Usual Differencing
axes[0].plot(df_drink.sales, label='Original Series')
axes[0].plot(df_drink.sales.diff(1), label='Usual Differencing')
axes[0].set_title('Usual Differencing')
axes[0].legend(loc='upper left', fontsize=10)
# Seasinal Differencing
axes[1].plot(df_drink.sales, label='Original Series')
axes[1].plot(df_drink.sales.diff(4), label='Seasonal Differencing', color='green')
axes[1].set_title('Seasonal Differencing')
plt.legend(loc='upper left', fontsize=10)
plt.suptitle('Drink Sales', fontsize=16)
plt.show()
toggle()
[3]:
As we can clearly see, the seasonal spikes is intact after applying usual differencing (lag 1). Whereas, it is rectified after seasonal differencing.
Build SARIMA Model¶
Find optimal SARIMA for house sales:
[4]:
sarima_house = pm.auto_arima(df_house.sales,
start_p=1, start_q=1,
test='adf',
max_p=3, max_q=3, m=4,
start_P=0, seasonal=True,
d=None, D=1, trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True)
sarima_house.summary()
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=3069.410, BIC=3087.760, Fit time=0.433 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 0, 4); AIC=3317.294, BIC=3324.633, Fit time=0.020 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 0, 4); AIC=3167.461, BIC=3182.140, Fit time=0.318 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=3242.302, BIC=3256.981, Fit time=0.270 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=3068.582, BIC=3090.601, Fit time=0.581 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 0, 4); AIC=3144.942, BIC=3163.292, Fit time=0.448 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 2, 4); AIC=3068.586, BIC=3094.275, Fit time=1.223 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 0, 4); AIC=3236.794, BIC=3251.474, Fit time=0.174 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(2, 1, 2, 4); AIC=3071.836, BIC=3101.195, Fit time=1.130 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=3235.957, BIC=3254.306, Fit time=0.431 seconds
Fit ARIMA: order=(2, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=3070.496, BIC=3096.185, Fit time=0.634 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 1, 4); AIC=3090.901, BIC=3109.250, Fit time=0.456 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(1, 1, 1, 4); AIC=3070.548, BIC=3096.237, Fit time=0.793 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(1, 1, 1, 4); AIC=3313.310, BIC=3327.989, Fit time=0.352 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(1, 1, 1, 4); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(2, 1, 1, 4); AIC=3069.171, BIC=3094.860, Fit time=0.674 seconds
Total fit time: 7.953 seconds
[4]:
| Dep. Variable: | y | No. Observations: | 294 |
|---|---|---|---|
| Model: | SARIMAX(1, 0, 1)x(1, 1, 1, 4) | Log Likelihood | -1528.291 |
| Date: | Mon, 29 Apr 2019 | AIC | 3068.582 |
| Time: | 18:11:14 | BIC | 3090.601 |
| Sample: | 0 | HQIC | 3077.404 |
| - 294 | |||
| Covariance Type: | opg |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| intercept | 0.0002 | 0.129 | 0.002 | 0.998 | -0.254 | 0.254 |
| ar.L1 | 0.9997 | 0.143 | 6.994 | 0.000 | 0.720 | 1.280 |
| ma.L1 | -0.3049 | 0.053 | -5.765 | 0.000 | -0.409 | -0.201 |
| ar.S.L4 | -0.1031 | 0.056 | -1.848 | 0.065 | -0.212 | 0.006 |
| ma.S.L4 | -0.9982 | 0.647 | -1.542 | 0.123 | -2.267 | 0.271 |
| sigma2 | 2109.4165 | 1043.105 | 2.022 | 0.043 | 64.968 | 4153.865 |
| Ljung-Box (Q): | 68.41 | Jarque-Bera (JB): | 5.09 |
|---|---|---|---|
| Prob(Q): | 0.00 | Prob(JB): | 0.08 |
| Heteroskedasticity (H): | 0.55 | Skew: | -0.14 |
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 3.59 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
Find optimal SARIMA for drink sales
[5]:
sarima_drink = pm.auto_arima(df_drink.sales,
start_p=1, start_q=1,
test='adf',
max_p=3, max_q=3, m=4,
start_P=0, seasonal=True,
d=None, D=1, trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True)
sarima_drink.summary()
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=809.758, BIC=820.230, Fit time=0.176 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 0, 4); AIC=848.072, BIC=852.261, Fit time=0.016 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(1, 1, 0, 4); AIC=813.490, BIC=821.867, Fit time=0.065 seconds
Fit ARIMA: order=(0, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=813.808, BIC=822.185, Fit time=0.143 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 1, 4); AIC=811.146, BIC=823.712, Fit time=0.381 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 0, 4); AIC=810.383, BIC=818.760, Fit time=0.135 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(0, 1, 2, 4); AIC=811.498, BIC=824.064, Fit time=0.210 seconds
Fit ARIMA: order=(1, 0, 1) seasonal_order=(1, 1, 2, 4); AIC=813.467, BIC=828.127, Fit time=0.301 seconds
Fit ARIMA: order=(2, 0, 1) seasonal_order=(0, 1, 1, 4); AIC=811.201, BIC=823.767, Fit time=0.296 seconds
Fit ARIMA: order=(1, 0, 0) seasonal_order=(0, 1, 1, 4); AIC=813.022, BIC=821.399, Fit time=0.139 seconds
Fit ARIMA: order=(1, 0, 2) seasonal_order=(0, 1, 1, 4); AIC=810.926, BIC=823.492, Fit time=0.338 seconds
Fit ARIMA: order=(0, 0, 0) seasonal_order=(0, 1, 1, 4); AIC=849.704, BIC=855.987, Fit time=0.069 seconds
Fit ARIMA: order=(2, 0, 2) seasonal_order=(0, 1, 1, 4); AIC=813.065, BIC=827.725, Fit time=0.284 seconds
Total fit time: 2.564 seconds
[5]:
| Dep. Variable: | y | No. Observations: | 64 |
|---|---|---|---|
| Model: | SARIMAX(1, 0, 1)x(0, 1, 1, 4) | Log Likelihood | -399.879 |
| Date: | Mon, 29 Apr 2019 | AIC | 809.758 |
| Time: | 18:11:17 | BIC | 820.230 |
| Sample: | 0 | HQIC | 813.854 |
| - 64 | |||
| Covariance Type: | opg |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| intercept | 121.9284 | 50.994 | 2.391 | 0.017 | 21.981 | 221.876 |
| ar.L1 | 0.4436 | 0.178 | 2.492 | 0.013 | 0.095 | 0.793 |
| ma.L1 | 0.5287 | 0.143 | 3.687 | 0.000 | 0.248 | 0.810 |
| ma.S.L4 | -0.2772 | 0.141 | -1.969 | 0.049 | -0.553 | -0.001 |
| sigma2 | 3.487e+04 | 6857.056 | 5.085 | 0.000 | 2.14e+04 | 4.83e+04 |
| Ljung-Box (Q): | 28.74 | Jarque-Bera (JB): | 0.83 |
|---|---|---|---|
| Prob(Q): | 0.91 | Prob(JB): | 0.66 |
| Heteroskedasticity (H): | 1.28 | Skew: | -0.29 |
| Prob(H) (two-sided): | 0.58 | Kurtosis: | 3.03 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
SARIMA Forecast¶
[6]:
sarima_forcast(sarima_house, df_house, 'sales', forecast_periods=24, freq='month')
[7]:
sarima_forcast(sarima_drink, df_drink, 'sales', forecast_periods=24, freq='quarter')
SARIMAX Model with Exogenous Variable¶
We have a SARIMA model if there is an external predictor, also called, “exogenous variable” built into SARIMA models. The only requirement to use an exogenous variable is that we need to know the value of the variable during the forecast period as well.
For the sake of demonstration, we use the seasonal index from the classical seasonal decomposition on the latest 3 years of data even though SARIMA already modeling the seasonality.
The seasonal index is a good exogenous variable for demonstration purpose because it repeats every frequency cycle, 4 quarters in this case. So, we always know what values the seasonal index will hold for the future forecasts.
[8]:
df_drink = add_seasonal_index(df_drink, 'sales', freq='quarter', model='multiplicative')
sarimax_drink = pm.auto_arima(df_drink[['sales']], exogenous=df_drink[['seasonal_index']],
start_p=1, start_q=1,
test='adf',
max_p=3, max_q=3, m=12,
start_P=0, seasonal=True,
d=None, D=1, trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True)
sarimax_drink.summary()
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=727.512, BIC=739.103, Fit time=0.576 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=730.208, BIC=736.003, Fit time=0.023 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=726.746, BIC=736.405, Fit time=0.297 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 1, 1, 12); AIC=728.526, BIC=738.185, Fit time=0.300 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 1, 0, 12); AIC=732.167, BIC=739.895, Fit time=0.044 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(2, 1, 0, 12); AIC=728.728, BIC=740.319, Fit time=1.051 seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(1, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(2, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 1, 0, 12); AIC=724.752, BIC=732.479, Fit time=0.283 seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(1, 1, 0, 12); AIC=726.648, BIC=736.307, Fit time=0.325 seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(1, 1, 0, 12); AIC=728.330, BIC=739.921, Fit time=0.387 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(2, 1, 0, 12); AIC=726.757, BIC=736.416, Fit time=0.547 seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(1, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(2, 1, 1, 12); AIC=nan, BIC=nan, Fit time=nan seconds
Total fit time: 3.865 seconds
[8]:
| Dep. Variable: | y | No. Observations: | 64 |
|---|---|---|---|
| Model: | SARIMAX(0, 1, 0)x(1, 1, 0, 12) | Log Likelihood | -358.376 |
| Date: | Mon, 29 Apr 2019 | AIC | 724.752 |
| Time: | 18:11:21 | BIC | 732.479 |
| Sample: | 0 | HQIC | 727.705 |
| - 64 | |||
| Covariance Type: | opg |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| intercept | 5.4344 | 44.502 | 0.122 | 0.903 | -81.787 | 92.656 |
| x1 | -0.0406 | 3.01e+04 | -1.35e-06 | 1.000 | -5.9e+04 | 5.9e+04 |
| ar.S.L12 | -0.4401 | 0.151 | -2.922 | 0.003 | -0.735 | -0.145 |
| sigma2 | 7.164e+04 | 1.95e+04 | 3.671 | 0.000 | 3.34e+04 | 1.1e+05 |
| Ljung-Box (Q): | 52.27 | Jarque-Bera (JB): | 2.60 |
|---|---|---|---|
| Prob(Q): | 0.09 | Prob(JB): | 0.27 |
| Heteroskedasticity (H): | 1.70 | Skew: | 0.54 |
| Prob(H) (two-sided): | 0.28 | Kurtosis: | 2.74 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[9]:
sarimax_forcast(sarimax_drink, df_drink, 'sales', forecast_periods=24, freq='quarter')
The coefficient of x1 is small and p-value is large, so the contribution from seasonal index is negligible.